86 research outputs found

    Alpha-divergences pour la segmentation d'images par contours actifs basés histogrammes : Application à l'analyse d'images médicales et biomédicales

    33 pages, submitted to the journal "Traitement du Signal". This article presents a histogram-based active contour segmentation method that uses the particular family of alpha-divergences as its similarity measure. The main interest of this method lies (i) in the flexibility of the alpha-divergences, whose intrinsic metric can be parameterised via the value of alpha and thus adapted to the statistical distributions of the image regions to segment; and (ii) in the unifying capacity of this statistical measure with respect to the distances classically used in this context (Kullback-Leibler, Hellinger, etc.). We first study this statistical measure from a supervised point of view, in which the iterative segmentation process is derived from the minimisation of the alpha-divergence between the current probability density and a manually defined reference. We then focus on the unsupervised point of view, which removes the reference-definition step by maximising the distance between the probability densities inside and outside the contour. In addition, we propose an optimisation scheme for the evolution of the alpha parameter, carried out jointly with the divergence extremisation process, which iteratively adapts the divergence to the statistics of the data at hand. Experimentally, we propose a comparative study of the different segmentation approaches: first on noisy and textured synthetic images, then on natural images. Finally, we focus our study on several applications from the biomedical (cellular confocal microscopy) and medical (X-ray radiography) domains in the context of computer-aided diagnosis. In each case, we discuss the contribution of the alpha-divergences.
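
    As a concrete illustration of the measure discussed above, the following sketch computes one common (Amari-type) parameterisation of the alpha-divergence between two discrete histograms; the abstract does not fix the paper's exact convention, so both the formula and the function name are assumptions.

        import numpy as np

        def alpha_divergence(p, q, alpha, eps=1e-12):
            """One common parameterisation of the alpha-divergence between
            two normalised histograms p and q (an assumption; the paper's
            exact convention is not given in the abstract)."""
            p = np.asarray(p, dtype=float) + eps
            q = np.asarray(q, dtype=float) + eps
            p, q = p / p.sum(), q / q.sum()
            if np.isclose(alpha, 1.0):   # limit case: Kullback-Leibler D(p||q)
                return float(np.sum(p * np.log(p / q)))
            if np.isclose(alpha, 0.0):   # limit case: reverse KL D(q||p)
                return float(np.sum(q * np.log(q / p)))
            # general case; alpha = 0.5 recovers the (squared) Hellinger
            # distance up to a constant factor
            return float((1.0 - np.sum(p**alpha * q**(1.0 - alpha)))
                         / (alpha * (1.0 - alpha)))

    Varying alpha between these limit cases is what lets the metric be tuned to the statistics of the regions to segment.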

    From text saliency to linguistic objects: learning linguistic interpretable markers with a multi-channels convolutional architecture

    A lot of effort is currently devoted to methods for analyzing and understanding the impressive performance of deep neural networks on tasks such as image or text classification. These methods are mainly based on visualizing the important input features the network takes into account to build a decision. However, these techniques (LIME, SHAP, Grad-CAM, or TDS, for instance) require extra effort to interpret the visualization with respect to expert knowledge. In this paper, we propose a novel approach to inspect the hidden layers of a fitted CNN in order to extract interpretable linguistic objects from texts, exploiting the classification process. In particular, we detail a weighted extension of the Text Deconvolution Saliency (wTDS) measure which can be used to highlight the relevant features used by the CNN to perform the classification task. We empirically demonstrate the efficiency of our approach on corpora from two different languages: English and French. On all datasets, wTDS automatically encodes complex linguistic objects based on co-occurrences and possibly on grammatical and syntactic analysis. Comment: 7 pages, 22 figures.
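
    The abstract does not spell out the wTDS formula, so the following is only a plausible sketch of a weighted text-saliency score: each word position gets the sum of its convolutional activations weighted by how much each filter matters for the target class. The weighting scheme and function name are assumptions, not the paper's definition.

        import numpy as np

        def weighted_tds(feature_maps, class_weights):
            """Minimal sketch of a weighted deconvolution-style saliency.

            feature_maps : (n_words, n_filters) conv-layer activations
            class_weights: (n_filters,) filter-to-class importance weights
            Returns one saliency score per word position."""
            return feature_maps @ class_weights   # (n_words,)

    A word with a high score is one whose convolutional features most support the predicted class, which is what makes the highlighted spans candidates for interpretable linguistic objects.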

    Linear kernel combination using boosting

    In this paper, we propose a novel algorithm to design multi-class kernels based on an iterative combination of weak kernels in a scheme inspired by the boosting framework. Our solution has a complexity linear in the training set size. We evaluate our method for classification on a toy example by integrating our multi-class kernel into a kNN classifier and comparing our results with a reference iterative kernel design method. We also evaluate our method for image categorization by considering a classic image database and comparing our boosted linear kernel combination with the direct linear combination of all features in a linear SVM.
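
    The abstract does not give the selection or weighting rule, so here is only a generic boosting-style sketch: at each round, add the weak kernel best aligned with what remains of the ideal kernel y y^T. Both the alignment criterion and the fixed weight `beta` are illustrative stand-ins, not the paper's update rule.

        import numpy as np

        def boost_kernels(grams, labels, n_rounds=10, beta=0.5):
            """Toy iterative combination of weak kernels.

            grams : list of precomputed (n, n) Gram matrices
            labels: (n,) class labels encoded as +1/-1 (binary toy case)
            Returns the combined (n, n) kernel matrix."""
            target = np.outer(labels, labels).astype(float)   # ideal kernel
            combined = np.zeros_like(target)
            for _ in range(n_rounds):
                residual = target - combined        # what is left to explain
                scores = [np.sum(K * residual) / (np.linalg.norm(K) + 1e-12)
                          for K in grams]
                combined += beta * grams[int(np.argmax(scores))]
            return combined

    Because each round only scores the candidate Gram matrices against a fixed residual, the per-round cost stays linear in the number of weak kernels.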

    Named Entity Recognition using Neural Networks for Clinical Notes

    Currently, the best performance for Named Entity Recognition in medical notes is obtained by systems based on neural networks. These supervised systems require precise features in order to learn well-fitted models from the training data, for the purpose of recognizing medical entities such as medications and Adverse Drug Events (ADE). Because this is an important issue to settle before training the neural network, we focus our work on building comprehensive word representations (the input of the neural network), combining character-based word representations with word representations. The proposed representation improves the performance of the baseline LSTM. However, it does not reach the performance of the top contenders in the challenge for detecting medical entities in clinical notes.
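
    A minimal sketch of such a comprehensive word representation, assuming a standard design: a word embedding concatenated with a character-level BiLSTM encoding of the word's spelling. Vocabulary sizes and dimensions below are placeholders, not the paper's settings.

        import torch
        import torch.nn as nn

        class WordRepresentation(nn.Module):
            """Word embedding + character-level BiLSTM encoding."""
            def __init__(self, n_words=10000, n_chars=100, w_dim=100, c_dim=25):
                super().__init__()
                self.word_emb = nn.Embedding(n_words, w_dim)
                self.char_emb = nn.Embedding(n_chars, c_dim)
                self.char_lstm = nn.LSTM(c_dim, c_dim, bidirectional=True,
                                         batch_first=True)

            def forward(self, word_ids, char_ids):
                # word_ids: (batch,)  char_ids: (batch, max_word_len)
                _, (h, _) = self.char_lstm(self.char_emb(char_ids))
                char_repr = torch.cat([h[0], h[1]], dim=-1)  # fwd/bwd states
                return torch.cat([self.word_emb(word_ids), char_repr], dim=-1)

    The concatenated vector then feeds the sequence LSTM, letting the tagger see both the word identity and its spelling (useful for drug names unseen at training time).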

    Machine Learning under the light of Phraseology expertise: use case of presidential speeches, De Gaulle - Hollande (1958-2016)

    Author identification and text genesis have always been hot topics for the statistical analysis of textual data community. Recent advances in machine learning have seen the emergence of machines competing with state-of-the-art computational linguistics methods on specific natural language processing tasks (part-of-speech tagging, chunking and parsing, etc.). In particular, Deep Linguistic Architectures are based on knowledge of language specificities such as grammar or semantic structure. These models are considered the most competitive thanks to their assumed ability to capture syntax. However, while these methods have proven their efficiency, their underlying mechanisms, both from a theoretical and an empirical point of view, remain hard to make explicit and to keep stable, which restricts their area of application. Our work sheds light on the mechanisms involved in deep architectures when applied to Natural Language Processing (NLP) tasks. The Query-By-Dropout-Committee (QBDC) algorithm is an active learning technique we have designed for deep architectures: it iteratively selects the most relevant samples to be added to the training set, so that the model improves the most when rebuilt from the new training set. In this article, however, we do not go into the details of the QBDC algorithm (it has already been studied in the original QBDC article); we rather confront the relevance of the sentences chosen by our active strategy with state-of-the-art phraseology techniques. We have thus conducted experiments on the presidential discourses of presidents C. De Gaulle, N. Sarkozy and F. Hollande in order to exhibit the interest of our active deep learning method in terms of discourse author identification, and to analyze the linguistic patterns extracted by our artificial approach compared to standard phraseology techniques.
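
    A sketch of the committee-by-dropout idea, under stated assumptions: the committee is obtained by keeping dropout active at inference time, and disagreement is measured with vote entropy. The original QBDC article defines its own criterion, so both the measure and the function names below are ours.

        import numpy as np

        def qbdc_select(predict_proba, pool, n_committee=10, n_select=100):
            """Sketch of one QBDC-style selection step.

            predict_proba(pool) -> (n_pool, n_classes); stochastic because
            dropout stays enabled, so each call is one committee member."""
            votes = np.stack([predict_proba(pool).argmax(axis=1)
                              for _ in range(n_committee)])   # (T, n_pool)
            n_classes = votes.max() + 1
            disagreement = np.zeros(votes.shape[1])
            for c in range(n_classes):
                freq = (votes == c).mean(axis=0)              # vote share
                disagreement -= np.where(freq > 0,
                                         freq * np.log(freq), 0.0)
            return np.argsort(-disagreement)[:n_select]       # most contested

    The selected indices are the pool sentences the dropout committee disagrees on most, which are then handed to the phraseology comparison described above.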

    Suivi 3D Monoculaire pour un Système de Vidéosurveillance à l'aide d'un Modèle de Mouvement et un Modèle d'Apparence

    Session "Atelier VISAGES"ISBN : 978-2-9539515-2-3National audienceLe besoin en méthodes non intrusives d'analyse des mouvements humains se fait sentir à travers des applications comme la vidéosurveillance intelligente, les interfaces homme-machine et l'indexation multimédia. Dans cet article, nous proposons une approche générative se basant sur un filtre particulaire à recuit simulé (APF) : une fonction de vraisemblance qui combine des mesures basées sur les silhouettes et sur l'apparence, et un modèle temporel se basant sur une réduction de l'espace des poses pour une activité donnée. Le filtre proposé permet d'estimer en ligne la vitesse de marche ainsi que les coordonnées du cycle dans l'espace réduit. Nous évaluons l'approche proposée sur la base de données HumanEva. Les résultats du suivi montrent que la fonction de vraisemblance mixte réduit l'erreur 3D. Le modèle temporel proposée permet d'améliorer le suivi tout en réduisant le coût calculatoire du filtre particulaire

    Hierarchical Multimodal Attention for Deep Video Summarization

    The way people consume sports on TV has drastically evolved in recent years, particularly under the combined effects of the legalization of sports betting and the huge increase in sports analytics. Several companies nowadays send observers to the stadiums to collect live data on all the events happening on the field during the match. These data contain meaningful information providing a very detailed description of all the actions occurring during the match, serving the coaches and staff, the fans, the viewers, and the gamblers. Exploiting all these data, sports broadcasters want to generate extra content such as match highlights, match summaries, and player and team analytics to appeal to subscribers. This paper explores the problem of summarizing professional soccer matches as automatically as possible, using both the aforementioned event-stream data collected from the field and the content broadcast on TV. We have designed an architecture introducing (1) a Multiple Instance Learning method that takes into account the sequential dependency among events and (2) a hierarchical multimodal attention layer that grasps the importance of each event in an action. We evaluate our approach on matches from two professional European soccer leagues, showing its capability to identify the best actions for automatic summarization by comparing with real summaries made by human operators.
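
    A minimal sketch of one level of such a hierarchy: an attention layer that scores each event inside an action and pools them into a single action representation. Dimensions and the scoring function are placeholders; the paper's hierarchical multimodal layer stacks such poolings over modalities and over actions.

        import torch
        import torch.nn as nn

        class EventAttentionPooling(nn.Module):
            """Attention-weighted pooling of event vectors into one
            action vector."""
            def __init__(self, dim=128):
                super().__init__()
                self.score = nn.Linear(dim, 1)

            def forward(self, events):                 # (batch, n_events, dim)
                weights = torch.softmax(self.score(events), dim=1)
                return (weights * events).sum(dim=1)   # (batch, dim)

    The learned weights double as an importance signal: events with high attention are natural candidates for the summary.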

    Generalised Mutual Information: a Framework for Discriminative Clustering

    In the last decade, successes in deep clustering have mostly involved Mutual Information (MI) as an unsupervised objective for training neural networks, with increasingly strong regularisations. While the quality of these regularisations has been widely discussed as a source of improvement, little attention has been dedicated to the relevance of MI as a clustering objective. In this paper, we first highlight how the maximisation of MI does not lead to satisfying clusters. We identify the Kullback-Leibler divergence as the main reason for this behaviour. Hence, we generalise the mutual information by changing its core distance, introducing the Generalised Mutual Information (GEMINI): a set of metrics for unsupervised neural network training. Unlike MI, some GEMINIs do not require regularisation during training, as they are geometry-aware thanks to distances or kernels in the data space. Finally, we highlight that GEMINIs can automatically select a relevant number of clusters, a property that has been little studied in the deep discriminative clustering context, where the number of clusters is a priori unknown. Comment: Submitted for review at the IEEE Transactions on Pattern Analysis and Machine Intelligence. This article is an extension of an original NeurIPS 2022 article [arXiv:2210.06300].
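
    One way to write the generalisation sketched in the abstract, in our own notation (the paper's exact definitions may differ): mutual information is the expected KL divergence between the cluster-conditional distribution and the marginal, and GEMINI swaps the KL term for another discrepancy D.

        I(X;Y) \;=\; \mathbb{E}_{y \sim p(y)}\!\left[ D_{\mathrm{KL}}\!\big(p(x \mid y)\,\big\|\,p(x)\big) \right]
        \qquad\longrightarrow\qquad
        \mathcal{I}_D(X;Y) \;=\; \mathbb{E}_{y \sim p(y)}\!\left[ D\big(p(x \mid y)\,\big\|\,p(x)\big) \right]

    Choosing D to be a kernel- or distance-based discrepancy (an MMD or a Wasserstein distance, say) is what makes the resulting objective geometry-aware in the data space.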

    DiagnoseNET: Automatic Framework to Scale Neural Networks on Heterogeneous Systems Applied to Medical Diagnosis

    Determining an optimal generalization model with deep neural networks for a medical task is an expensive process that generally requires large amounts of data and computing power. Furthermore, scaling deep learning workflows over a wide range of emerging heterogeneous system architectures increases the programming complexity of model training and the orchestration of the computation. We introduce DiagnoseNET, a programming framework designed for scaling deep learning models over heterogeneous systems, applied to medical diagnosis. It is designed as a modular framework that enables deep learning workflow management and preserves the expressiveness of neural networks written in TensorFlow, while its runtime abstracts data locality, micro-batching and distributed orchestration to scale the neural network model from a GPU workstation to multiple nodes. The main approach is composed of a set of gradient computation modes to adapt the neural network according to the memory capacity, the number of workers, the coordination method and the communication protocol (gRPC or MPI), in order to achieve a balance between accuracy and energy consumption. The experiments carried out evaluate the computational performance in terms of accuracy, convergence time and worker scalability, to determine an optimal neural architecture over a mini-cluster of Jetson TX2 nodes. These experiments were performed on two medical case studies: the first dataset is composed of clinical descriptors collected during the first week of hospitalization of patients in the Provence-Alpes-Côte d'Azur region; the second dataset uses short ECG records of between 30 and 60 seconds, obtained as part of the PhysioNet 2017 Challenge.
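
    A conceptual sketch of one such gradient computation mode, the synchronous data-parallel case over MPI: each worker computes a gradient on its own micro-batch, gradients are averaged across workers, and every replica applies the same update. This illustrates the coordination idea only; it is not DiagnoseNET's actual API, and the function name is ours.

        import numpy as np
        from mpi4py import MPI

        def averaged_gradient_step(params, local_gradient, lr=0.01):
            """One synchronous data-parallel update over MPI."""
            comm = MPI.COMM_WORLD
            avg = np.empty_like(local_gradient)
            comm.Allreduce(local_gradient, avg, op=MPI.SUM)  # sum over workers
            avg /= comm.Get_size()                           # average
            return params - lr * avg            # identical update on all nodes

    Swapping the collective (MPI allreduce versus gRPC parameter exchange) or the synchrony of the averaging is what lets such a mode trade accuracy against energy consumption on low-power nodes like the Jetson TX2.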